Approximation Algorithms for Hamming Clustering Problems
Authors
Abstract
We study Hamming versions of two classical clustering problems. The Hamming radius p-clustering problem (HRC) for a set S of k binary strings, each of length n, is to find p binary strings of length n that minimize the maximum Hamming distance between a string in S and the closest of the p strings; this minimum value is termed the p-radius of S and is denoted by ρ. The related Hamming diameter p-clustering problem (HDC) is to split S into p groups so that the maximum of the Hamming group diameters is minimized; this latter value is called the p-diameter of S. First, we provide an integer programming formulation of HRC which yields exact solutions in polynomial time whenever k and p are constant. We also observe that HDC admits straightforward polynomial-time solutions when k = O(log n) or p = 2. Next, by reduction from the corresponding geometric p-clustering problems in the plane under the L1 metric, we show that neither HRC nor HDC can be approximated within any constant factor smaller than two unless P = NP. We also prove that for any ε > 0 it is NP-hard to split S into at most $p k^{1/7-\varepsilon}$ clusters whose Hamming diameter does not exceed the p-diameter. Furthermore, we note that by adapting Gonzalez' farthest-point clustering algorithm [6], HRC and HDC can be approximated within a factor of two in time O(pkn). Next, we describe a $2^{O(p\rho/\varepsilon)} k^{O(p/\varepsilon)} n^2$-time (1 + ε)-approximation algorithm for HRC. In particular, it runs in polynomial time when p = O(1) and ρ = O(log(k + n)). Finally, we show how to find in $O((n/\varepsilon + kn \log n + k^2 \log n)(2^{\rho} k)^{2/\varepsilon})$ time a set L of O(p log k) strings of length n such that for each string in S there is at least one string in L within distance (1 + ε)ρ, for any constant 0 < ε < 1.
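As an illustration of the factor-two result mentioned in the abstract, the following is a minimal sketch of farthest-point clustering under Hamming distance. It is not taken from the paper: the function names `hamming` and `farthest_point_centers` are hypothetical, strings are represented as Python `str` values over `0` and `1`, and centers are assumed to be chosen from S itself, as in the usual greedy scheme. Each of the at most p rounds computes k distances of cost O(n), which matches the O(pkn) bound.

```python
from typing import List, Sequence, Tuple


def hamming(a: str, b: str) -> int:
    """Number of positions where two equal-length binary strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def farthest_point_centers(strings: Sequence[str], p: int) -> Tuple[List[str], int]:
    """Greedily pick up to p centers from `strings`.

    Returns the chosen centers and the resulting cover radius, i.e. the
    maximum distance from any input string to its nearest center.
    Assumption: centers are restricted to members of the input set.
    """
    if not strings or p < 1:
        raise ValueError("need a non-empty input and p >= 1")

    centers = [strings[0]]  # the first center is arbitrary
    # dist[i] = distance from strings[i] to its nearest chosen center
    dist = [hamming(s, centers[0]) for s in strings]

    while len(centers) < p:
        # The next center is the string farthest from all current centers.
        far = max(range(len(strings)), key=dist.__getitem__)
        if dist[far] == 0:
            break  # every string already coincides with a center
        centers.append(strings[far])
        dist = [min(d, hamming(s, strings[far])) for d, s in zip(dist, strings)]

    return centers, max(dist)


if __name__ == "__main__":
    S = ["0000", "0001", "1111", "1110", "0011"]
    centers, radius = farthest_point_centers(S, p=2)
    print(centers, radius)
```

For this small input the greedy cover radius is 2, while the centers 0001 and 1111 give radius 1, so the result stays within the factor-two guarantee. Assigning each string to its nearest chosen center yields the corresponding grouping for the diameter variant.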
Similar resources
Massively Parallel Algorithms and Hardness for Single-Linkage Clustering Under $\ell_p$-Distances
We present massively parallel (MPC) algorithms and hardness of approximation results for computing Single-Linkage Clustering of n input d-dimensional vectors under Hamming, $\ell_1$, $\ell_2$ and $\ell_\infty$ distances. All our algorithms run in O(log n) rounds of MPC for any fixed d and achieve (1 + ε)-approximation for all distances (except Hamming for which we show an exact algorithm). We also show constant-factor ...
Obliviously Approximating Sequence Distances
There are several applications for schemes which approximately find the distance between two sequences in a way that is 'oblivious' of one of the sequences up until a final sublinear number of comparisons. This paper shows how sequences can be preprocessed obliviously to give a binary string, so that a simple vector distance between two bitstrings gives an approximation to a sequence distance of inte...
Hamming Approximation of NP Witnesses
Given a satisfiable 3-SAT formula, how hard is it to find an assignment to the variables that has Hamming distance at most n/2 to a satisfying assignment? More generally, consider any polynomial-time verifier for any NP-complete language. A d(n)-Hamming-approximation algorithm for the verifier is one that, given any member x of the language, outputs in polynomial time a string a with Hamming dis...
Submodular Hamming Metrics
We show that there is a largely unexplored class of functions (positive polymatroids) that can define proper discrete metrics over pairs of binary vectors and that are fairly tractable to optimize over. By exploiting submodularity, we are able to give hardness results and approximation algorithms for optimizing over such metrics. Additionally, we demonstrate empirically the effectiveness of the...
Lp-Testing Draft full
We initiate a systematic study of sublinear algorithms for approximately testing properties of real-valued data with respect to Lp distances for p ≥ 1. Such algorithms distinguish datasets which either have (or are close to having) a certain property from datasets which are far from having it with respect to Lp distance. For applications involving noisy real-valued data, using Lp distances allow...
Journal title:
- J. Discrete Algorithms
Volume 2, Issue
Pages -
Publication date 2000